Abstract

The two-parameter dynamical replica theory (2-DRT) is applied to investigate the retrieval properties
of non-monotonic associative memory, a model that lacks thermodynamic potential functions. 2-DRT
reproduces the dynamical properties of the model quite well, including the storage capacity and the
basins of attraction. The superretrieval state is also discussed in the framework of 2-DRT. The local
stability condition of the superretrieval state is given; it provides a better estimate of the region in
which superretrieval is observed experimentally than the self-consistent signal-to-noise analysis (SCSNA)
does.

1 Introduction 

The Hopfield model has attracted the interest of researchers in various fields, and an enormous number of
studies, both numerical and theoretical, have been carried out on it. Of particular interest among them
are the studies of the model with non-monotonic units [1]. Whereas the conventional model uses a
monotonic function such as $f(x) = \tanh\beta x$ ($\beta > 0$) as the output function $f$ of a unit, the
model with non-monotonic units, or the non-monotonic model for short, uses a non-monotonic function. It
has been reported that the non-monotonic model has various desirable properties as a model of associative
memory, such as an enhanced storage capacity and enlarged basins of attraction associated with retrieval
states.

To argue such properties of the non-monotonic model rigorously, theoretical analyses are
necessary. However, attempts to analyze the non-monotonic model often face difficulty because
the non-monotonic model in general does not have a Lyapunov function, which would be a powerful
analytical tool for characterizing the equilibrium as well as the dynamical properties of the model. Thus,
the applicable theories are restricted to those that were devised for analyzing the conventional model and
yet are independent of the functional form of the output function $f$. As for equilibrium analysis, the
self-consistent signal-to-noise analysis (SCSNA) has been applied to the non-monotonic model [2], and
some interesting properties, including the existence of the so-called superretrieval states, have been
found. For retrieval dynamics, on the other hand, no exact and tractable theory is currently
known even for the conventional model. The path integral formalism [3] and the Gardner-Derrida-Mottishaw
theory [4] (for the "zero-temperature," or $\beta \to +\infty$, case) are exact theories for the asynchronous (or
Glauber) and synchronous (or Little) dynamics, respectively, but computing the dynamics on the basis of
either of them is prohibitively difficult. It has been generally believed [5], [6] that any tractable theory
of retrieval dynamics necessarily incorporates some approximation. For the conventional model with
the synchronous dynamics, the Amari-Maginu theory [7] (for the zero-temperature case; for an extension to the
finite-temperature case, see [8]) has been proposed as one such theory, and Nishimori and Opriş [9] have
applied it to the non-monotonic case. As one of the tractable theories for the conventional model with
the asynchronous dynamics, Coolen and Sherrington [10], [11] have proposed the 2-parameter dynamical
replica theory (2-DRT hereafter; it is also sometimes called the Coolen-Sherrington (CS) theory). Of
course it is an approximate theory, as confirmed, for example, by Ozeki and Nishimori [12] and Tanaka
and Osawa [13]; nevertheless, it describes the retrieval dynamics of the conventional model reasonably well.
As Ozeki and Nishimori [12] have mentioned, the formulation of 2-DRT does not depend on the functional
form of the output function $f$, and it is therefore possible to apply it to the non-monotonic model.
Thus, the following question naturally arises: How well does 2-DRT describe the retrieval dynamics of the
non-monotonic model? In this paper, we address this question, with emphasis placed on the storage
capacity, the size of the basins of attraction, and the superretrieval states.

Figure 1: Non-monotonic function $f(x)$.



2 Model 

Let us consider a model with $N$ units. Each unit has a binary variable $s_i \in \{-1, 1\}$, $i = 1, \ldots, N$, and
$s = [s_1, \ldots, s_N] \in \{-1, 1\}^N$ represents a microscopic state of the model. Each unit stochastically and
asynchronously updates the value of $s_i$ based on the current value of the "local field"

$$h_i(s) = \sum_j J_{ij} s_j, \tag{1}$$

where $J_{ij}$ is the synaptic weight from neuron $j$ to neuron $i$. The probability of the state flip $s_i := -s_i$ is given
by the following transition probability

$$w_i(s) = \frac{1}{2}\bigl(1 - s_i f(h_i(s))\bigr), \tag{2}$$

where $f : \mathbb{R} \to [-1, 1]$ is the output function. Taking $f(x) = \tanh\beta x$ yields the conventional Hopfield
model. In this paper we consider the following non-monotonic function (Fig. 1):

$$f(x) = \begin{cases} -\operatorname{sgn}(x) & (|x| \ge \theta) \\ \operatorname{sgn}(x) & (|x| < \theta) \end{cases} \tag{3}$$

Its functional form is the same as that treated by Nishimori and Opriş [9]. Throughout the paper, we
follow the common convention for the time scale: time is measured so that each unit executes, on
average, one update per unit time.
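The update rule of eqs. (1)-(3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the simulation code used for the results in this paper: the function names, the use of NumPy, and the choice of $N$ random single-unit update attempts per unit of time (one reading of the time-scale convention above) are all assumptions introduced here.

```python
import numpy as np

def f_nonmono(x, theta):
    """Non-monotonic output function: sgn(x) for |x| < theta, -sgn(x) otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < theta, np.sign(x), -np.sign(x))

def flip_probability(s_i, h_i, theta):
    """Probability of the flip s_i -> -s_i: (1 - s_i f(h_i)) / 2."""
    return 0.5 * (1.0 - s_i * f_nonmono(h_i, theta))

def async_sweep(s, J, theta, rng):
    """Advance the state by one unit of time (assumed: N random update attempts)."""
    n = s.size
    for _ in range(n):
        i = rng.integers(n)          # pick a unit at random
        h_i = J[i] @ s               # local field of unit i
        if rng.random() < flip_probability(s[i], h_i, theta):
            s[i] = -s[i]             # execute the flip in place
    return s
```

Because $f$ takes only the values $\pm 1$ away from $x = 0$, the flip probability is either 0 or 1 for a given local field; the stochasticity in this sketch enters only through the random choice of the unit to update.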

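The model memorizes $p = \alpha N$ binary patterns $\xi^\mu = [\xi_1^\mu, \ldots, \xi_N^\mu] \in \{-1, 1\}^N$, $\mu = 1, \ldots, p$, via the
Hebb rule,

$$J_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \xi_i^\mu \xi_j^\mu \quad (i \ne j), \qquad J_{ii} = 0. \tag{4}$$

The quantity $\alpha = p/N$ is called the memory rate. We consider the case where the patterns to be
memorized are randomly generated, that is, each $\xi_i^\mu$ takes the value $\pm 1$ with probability 1/2, independently
of the others.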



To measure how well the model retrieves a pattern $\mu$, the correlation, or overlap,

$$m_\mu(s) = \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu s_i \tag{5}$$

is used: $|m_\mu(s)| \le 1$ holds by definition, and if $m_\mu(s) = 1$, then the model is in the state $s = \xi^\mu$ and
is regarded as perfectly retrieving the pattern $\mu$. When $m_\mu(s) = -1$, the model is in the state $s = -\xi^\mu$.
This state is called the reversal state, but it can also be regarded as retrieving the pattern $\mu$ owing to the
symmetry of the model.
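The Hebb rule and the overlap can be illustrated with a short Python sketch. The function names are hypothetical, and setting the self-couplings $J_{ii}$ to zero is an assumption of this sketch rather than something taken from the text above.

```python
import numpy as np

def random_patterns(p, n, rng):
    """p random binary patterns of length n, entries +1 or -1 with probability 1/2."""
    return rng.choice([-1, 1], size=(p, n))

def hebb_weights(xi):
    """Hebbian couplings J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, zero diagonal (assumed)."""
    n = xi.shape[1]
    J = xi.T @ xi / n
    np.fill_diagonal(J, 0.0)
    return J

def overlaps(xi, s):
    """Overlaps m_mu(s) = (1/N) sum_i xi_i^mu s_i for all stored patterns."""
    return xi @ s / s.size
```

For example, `overlaps(xi, xi[0])[0]` evaluates to 1 (perfect retrieval of the first pattern) and `overlaps(xi, -xi[0])[0]` to -1 (the reversal state).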

We assume that the model is going to retrieve one single pattern, so that $m_\mu(s)$ is of order unity for
that pattern only (the condensed ansatz), and that pattern $\mu = 1$ is nominated for retrieval, without loss
of generality. Then $m = m_1(s)$ can be taken as a macroscopic variable, or order parameter, describing
how well the model retrieves the nominated pattern. If the model reaches an equilibrium with $m \ne 0$,
the model is said to successfully retrieve the pattern, and such an equilibrium is called a retrieval state.
The local field $h_i(s)$ is now rewritten as

$$h_i(s) = \xi_i^1 \bigl[m + z_i(s)\bigr] - \frac{1}{N} s_i, \qquad z_i(s) = \xi_i^1 \sum_{\mu > 1} \xi_i^\mu \, \frac{1}{N} \sum_{j \ne i} \xi_j^\mu s_j. \tag{6}$$

The term $z_i(s)$ represents the interference in the local field from the non-nominated patterns $\mu > 1$, and is
called the "noise" term. Although the time evolution of the microscopic state of the model is stochastic in
nature, it is observed that the evolution of the order parameter $m$ in the course of pattern retrieval can
often be regarded as being governed by a certain deterministic law. Describing this retrieval dynamics is
one of the important problems in this field. The path integral formalism [3] provides an exact description of the
retrieval dynamics, but it requires a parametrization with infinitely many degrees of freedom and is hence
practically intractable.



3 Two-parameter dynamical replica theory (2-DRT) 

2-DRT, proposed by Coolen and Sherrington [10], [11], provides a tractable description of the retrieval
dynamics that is reasonably accurate for the conventional model. It uses two order parameters, $m$ and $r$,
the latter being defined as

$$r = \frac{1}{\alpha} \sum_{\mu > 1} \bigl(m_\mu(s)\bigr)^2, \tag{7}$$

which intuitively represents the degree of interference of the non-nominated patterns $\mu > 1$ with
the retrieval of pattern 1. 2-DRT derives deterministic flow equations for these two order parameters.
Without any assumptions, one cannot expect the flow equations for these two parameters to be closed,
and thus the time evolution of $m$ and $r$ cannot be completely determined by their current values. In
the framework of DRT in general, the following two assumptions are made:

1. Self-averaging of the flow equations with respect to the randomness of the system (the randomly chosen
patterns in the case treated in this paper).

2. Probability equipartitioning within subshells: when the values of the order parameters are given, the
probability distribution of the corresponding microscopic states can be regarded as uniform
over the subshell (the set of microscopic states that have the specified values of the order
parameters), as far as the calculation of the flow equations is concerned.

Owing to these closing assumptions, one can derive the deterministic flow equations for $m$ and $r$:

$$\frac{dm}{dt} = \int dz\, D_{m,r}[z]\, f(m + z) - m,$$

$$\frac{1}{2} \frac{dr}{dt} = \frac{1}{\alpha} \int dz\, D_{m,r}[z]\, z f(m + z) + 1 - r,$$

where $D_{m,r}[z]$ is the distribution of the noise terms. Replica calculation gives, within the
replica-symmetric (RS) ansatz, the RS solution $D^{\mathrm{RS}}_{m,r}[z]$ for the noise distribution, which has been derived by
CS [10], [11] as
where $Dy = (dy/\sqrt{2\pi})\, e^{-y^2/2}$ is the Gaussian measure. The parameters $\{q, \lambda, \rho, \mu\}$ are to be determined
from $m$, $r$, and $\alpha$ by the following saddle-point equations:

$$\ldots \int Dy\, \tanh(\lambda y + \mu), \qquad \ldots\; 1 - \int Dy\, \tanh^2(\lambda y + \mu) \;\ldots \tag{11}$$



It should be noted that the replica calculation of the noise distribution bears no relation to the
dynamics of the model or to the choice of the output function $f$; it performs averaging over the
$(m, r)$-subshell with uniform measure, and therefore the calculation is not dynamical but configurational.
This is the reason why, apart from the validity of the two assumptions, 2-DRT can be straightforwardly
applied to the non-monotonic model.
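Since the output function enters the flow equations only through the integrals over the noise distribution, their right-hand sides can be evaluated numerically for any tabulated noise density. The Python sketch below illustrates this with a simple Riemann sum. The function names and the grid are hypothetical, and the zero-mean Gaussian of variance $\alpha r$ used here is only a crude stand-in chosen for illustration; it is not the RS solution, which must be obtained from the saddle-point equations.

```python
import numpy as np

def f_nonmono(x, theta):
    """Non-monotonic output function: sgn(x) for |x| < theta, -sgn(x) otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < theta, np.sign(x), -np.sign(x))

def flow_rhs(m, r, alpha, theta, z, density):
    """Return (dm/dt, dr/dt) for a noise density tabulated on a uniform grid z."""
    dz = z[1] - z[0]
    w = density / (density.sum() * dz)          # normalise the tabulated density
    fz = f_nonmono(m + z, theta)
    dm_dt = np.sum(w * fz) * dz - m
    dr_dt = 2.0 * (np.sum(w * z * fz) * dz / alpha + 1.0 - r)
    return dm_dt, dr_dt

def gaussian_stand_in(z, alpha, r):
    """Illustrative stand-in density (NOT the RS noise distribution)."""
    return np.exp(-z**2 / (2.0 * alpha * r))

# Usage: theta = inf recovers the sign function of the zero-temperature conventional model.
z = np.linspace(-5.0, 5.0, 20001)
dm_dt, dr_dt = flow_rhs(0.0, 1.0, 0.2, np.inf, z, gaussian_stand_in(z, 0.2, 1.0))
```

Replacing `gaussian_stand_in` by a tabulation of the RS noise distribution would give the actual 2-DRT flow; the integration routine itself does not depend on the choice of $f$.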

The latter of the two above-mentioned assumptions, the equipartitioning assumption, is a critical one
because it is known not to be valid for the conventional model [13], as well as for the continuous-valued
linear system, or the Langevin spin system [14]. To show this directly for the conventional model,
Tanaka and Osawa [13] have proposed a dynamics called the $(m, r)$-annealing. The $(m, r)$-annealing is
defined, on the basis of the solution $\{\rho, \mu\}$ of the saddle-point equations for given $m$ and $r$, as the
dynamics of the conventional model with the inverse temperature $\rho$ (that is, it uses $f(x) = \tanh \rho x$), but
with an extra bias added to the local field,



hi(s) 



(12) 



where $b = \mu/\rho - m$. It has been shown that the $(m, r)$-annealing executes Monte Carlo sampling from an
$(m, r)$-subshell with uniform probability (in the limit $N \to \infty$), hence realizing the equipartitioning. It
is useful for investigating the validity of the equipartitioning assumption, and will be utilized in this paper.



4 Results 

4.1 Time evolution of order parameters 

We first examined whether 2-DRT describes the overall characteristics of the dynamics of the non-monotonic
model. We adopt the common convention that the initial microscopic state of the model is given by
randomly corrupting the nominated pattern; that is, the initial state $s$ is drawn according to the probability law
$\mathrm{Prob}[s_i] = \delta(s_i - \xi_i^1)(1 + m_0)/2 + \delta(s_i + \xi_i^1)(1 - m_0)/2$, so that the initial overlap $m(t = 0)$ is approximately
equal to $m_0$ when $N$ is sufficiently large. Under this initialization procedure, the initial value of
$r$ is approximately equal to 1 when $N$ is sufficiently large. We found that 2-DRT does describe the
dynamics of the model considerably well, as shown in Fig. 2 for the case with $\alpha = 0.2$ and $\theta = 1.4$.
2-DRT reproduced the trajectories almost exactly when retrieval succeeded. Noticeable disagreement
between simulations and 2-DRT was seen in cases where retrieval failed. A characteristic aspect of the
disagreement is that the trajectories predicted by 2-DRT exhibit, in the early stages, an overall slowing down
relative to the corresponding simulations. These observations are essentially the same as those found in the
conventional model [11], [12], [13]. The observed disagreement is due to the failure of the equipartitioning
assumption of 2-DRT, just as in the conventional model, as demonstrated by the following numerical
experiment. Simulate a model with $\theta = 1.4$ and $\alpha = 0.2$ from the initial condition $(m, r) = (0.1, 1)$,
stop it at $t = 1$, and then execute the $(m, r)$-annealing for 80 units of time. The flow vectors
$(\dot m, \dot r)$, evaluated from (1) the model just before executing the $(m, r)$-annealing, (2) the model just
after it, and (3) 2-DRT, were compared, and the result is summarized in Fig. 3. The flow vectors of the
model after the $(m, r)$-annealing and of 2-DRT are almost the same, whereas the flow vector before the
$(m, r)$-annealing differs from these two, which means that the equipartitioning assumption does not hold
in this case.

Figure 2: Dynamics obtained by simulations with $N = 2^{15}$ (solid) and computed by 2-DRT (dashed) for
the case with $\alpha = 0.2$ and $\theta = 1.4$.

Figure 3: Flow vectors $(\dot m, \dot r)$ at $t = 1$ on the trajectory starting at $(m, r) = (0.1, 1)$ for the model with
$\alpha = 0.2$, $\theta = 1.4$ and $N = 2^{15}$. Three flow vectors, the ones before and after the $(m, r)$-annealing and the one
computed by 2-DRT, are shown.

Figure 4: Plot of $\alpha_e$, the maximum of $\alpha$ for which a retrieval state exists (solid), and the critical capacity $\alpha_0$
(dashed), below which superretrieval occurs, against $\theta$, evaluated by SCSNA calculation. The thin dashed
line shows $\alpha = \theta$, which limits the capacity in the small-$\theta$ region.
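The initialization convention used in this subsection, in which each unit of the nominated pattern is kept with probability $(1 + m_0)/2$, can be sketched in Python as below, together with a direct evaluation of the two order parameters. The function names and the system sizes in the usage lines are illustrative assumptions, much smaller than the $N = 2^{15}$ used in the simulations.

```python
import numpy as np

def corrupted_state(xi1, m0, rng):
    """Initial state: s_i = xi_i^1 with probability (1 + m0)/2, else -xi_i^1."""
    keep = rng.random(xi1.size) < (1.0 + m0) / 2.0
    return np.where(keep, xi1, -xi1)

def order_parameters(xi, s):
    """Return (m, r): overlap with pattern 1 and r = (1/alpha) sum_{mu>1} m_mu^2."""
    p, n = xi.shape
    alpha = p / n
    m_all = xi @ s / n
    return m_all[0], np.sum(m_all[1:] ** 2) / alpha

# Usage with alpha = 0.2 and m0 = 0.1 (illustrative sizes).
rng = np.random.default_rng(0)
xi = rng.choice([-1, 1], size=(800, 4000))
s0 = corrupted_state(xi[0], 0.1, rng)
m_init, r_init = order_parameters(xi, s0)   # expected near (0.1, 1) for large N
```

The deviations of `m_init` from $m_0$ and of `r_init` from 1 are finite-size fluctuations that shrink as $N$ grows.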



4.2 Capacity and basins of attraction 

In this paper, we define the storage capacity $\alpha_c$ as the maximum value of $\alpha$ for which a stable macroscopic
state with nonvanishing $m$ exists. We call an equilibrium macroscopic state with nonvanishing $m$ the
retrieval state. Since the retrieval state may be unstable, the condition for the existence of a retrieval
state gives an overestimate of the true storage capacity. On the other hand, a stable retrieval state
may have a very small basin of attraction, so that we may fail to find such a retrieval state in numerical
experiments even though it is stable.

Shiino and Fukai [2] have analyzed the equilibrium properties of continuous-valued continuous-time
non-monotonic models using SCSNA. Although we consider a model with binary variables in this paper
rather than one with continuous values, the equilibrium condition is shown to be the same as that for the
continuous-valued continuous-time model owing to the current choice of the output function $f$ (eq. (3)),
and thus SCSNA can be applied to the model treated here. We executed the SCSNA calculation on this
model, and Fig. 4 shows the result for $\alpha_e$, the maximum of $\alpha$ for which a retrieval state exists, versus $\theta$.
When $\theta \to \infty$, $\alpha_e$ approaches the well-known value 0.138, confirming that SCSNA is consistent with the
Amit-Gutfreund-Sompolinsky (AGS) theory [15]. As $\theta$ becomes smaller, $\alpha_e$ increases until it reaches
its maximal value $\alpha_e = 0.489$ at $\theta \approx 0.7$.

As we have already discussed, $\alpha_e$ gives an overestimate of the true storage capacity $\alpha_c$ [?]. To
evaluate $\alpha_c$ itself, we have to take into account the dynamical aspect of the retrieval process. This
leads us to the idea of applying 2-DRT to determine the storage capacity $\alpha_c$, by observing
whether or not the trajectories from the initial conditions approach a retrieval state.

Two points have to be mentioned here. First, although 2-DRT becomes exact at equilibrium for the
conventional model [10], [11], it is no longer so for the non-monotonic model, which means that 2-DRT may
not reproduce the storage capacity. Second, since we assume the initialization procedure described above,
only the retrieval states that are reachable from initial conditions with $r = 1$ are observed
in the simulations. To correspond with this experimental setup, we estimated the storage
capacity $\alpha_c$ by tracking 2-DRT trajectories from initial conditions with $r = 1$. The storage capacity
estimated by the simulations and by the 2-DRT trajectory tracking may therefore underestimate
the true storage capacity, because there might be stable retrieval states unreachable from any
initial condition with $r = 1$ (see Sect. 4.3).

Figure 5: Plot of the storage capacity $\alpha_c$ against $\theta$, evaluated by simulation (solid) and by 2-DRT trajectory
tracking (dashed). Plots of $\alpha_e$ and $\alpha_0$ evaluated by SCSNA (shown in Figure 4) are also shown for
comparison.

Figure 5 shows the storage capacity $\alpha_c$ estimated by 2-DRT trajectory tracking and by simulations.
It can be seen that 2-DRT trajectory tracking reproduces well the storage capacity obtained by the
simulations over the whole range of $\theta$. Comparing it with the SCSNA result in Fig. 4 reveals that, while the
agreement is good when $\theta$ is large, a discrepancy becomes apparent as $\theta$ falls below about 1.5: the
storage capacity estimated by 2-DRT is considerably smaller than that estimated by SCSNA. This may be
partly explained by the overestimation by SCSNA described above.

One of the main advantages of a dynamical theory is that it allows us to evaluate basins of attraction,
because they are essentially of a dynamical nature. We used 2-DRT to evaluate the basin of attraction
of the retrieval states. Since we adopt the above-mentioned initialization convention, we consider the
basin of attraction as being represented in terms of $m$ only; that is, we regard a value of $m_0$ as belonging
to the basin of attraction of the retrieval state if the trajectory starting at the state $(m, r) = (m_0, 1)$
approaches the retrieval state. Evaluating the basin of attraction defines the critical initial overlap $m_c$,
such that initial states $(m_0, 1)$ with $m_0 > m_c$ yield successful retrieval. Figures 6 and 7 show
the critical initial overlap $m_c$ and the values of $m$ of the retrieval state, respectively, evaluated by 2-DRT
and by simulations. It is readily seen that the basin of attraction is enlarged as $\theta$ becomes
small, and that 2-DRT captures this phenomenon reasonably well.
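One possible way to locate the critical initial overlap numerically is bisection on the initial overlap. The Python sketch below assumes a user-supplied predicate `retrieves(m0)`, a hypothetical function that integrates a trajectory (simulated or 2-DRT) from $(m_0, 1)$ and reports whether it reached the retrieval state, and it further assumes that retrieval succeeds for all $m_0$ above a single threshold. Neither the predicate nor the bisection is taken from the procedure actually used for Fig. 6.

```python
def critical_overlap(retrieves, lo=0.0, hi=1.0, tol=1e-3):
    """Bisect for the smallest initial overlap m0 at which retrieval succeeds.

    `retrieves` is a hypothetical predicate m0 -> bool, assumed monotone in m0.
    Returns None when retrieval fails even at m0 = hi.
    """
    if not retrieves(hi):
        return None                 # no basin of attraction found
    if retrieves(lo):
        return lo                   # every initial overlap in range succeeds
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if retrieves(mid):
            hi = mid                # threshold is at or below mid
        else:
            lo = mid                # threshold is above mid
    return hi

# Usage with a toy predicate standing in for a trajectory computation.
m_c = critical_overlap(lambda m0: m0 > 0.3)
```

Each call to the predicate costs one full trajectory, so the number of trajectories grows only logarithmically with the requested resolution `tol`.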



4.3 Superretrieval states 

As a result of the SCSNA analysis of non-monotonic models, Shiino and Fukai [2] have shown that there
is a phase where equilibrium states corresponding to "perfect" retrieval exist. Such states are called
the superretrieval states, and their existence is one of the unique features of the non-monotonic models. Here,
"perfect" means that the correlation of the sign of the local field $h_i(s)$ (not of $s$) with the nominated
pattern $\xi^1$ is exactly equal to $\pm 1$. The correlation defined in this way is called the tolerance overlap [2]. The
critical capacity $\alpha_0$, below which the superretrieval occurs, can be evaluated numerically by SCSNA,
and is also shown in Fig. 4.
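The tolerance overlap can be computed directly from a microscopic state, as in the following Python sketch. The function name is hypothetical, and the coupling matrix `J` is assumed to be supplied by the caller (for instance, Hebbian couplings).

```python
import numpy as np

def tolerance_overlap(xi1, s, J):
    """Correlation between the sign of the local field h = J s and the pattern xi^1."""
    h = J @ s
    return np.mean(xi1 * np.sign(h))

# Usage: with a single stored pattern, the state s = xi^1 gives tolerance overlap 1.
rng = np.random.default_rng(0)
xi1 = rng.choice([-1, 1], size=100)
J = np.outer(xi1, xi1) / xi1.size
np.fill_diagonal(J, 0.0)
t = tolerance_overlap(xi1, xi1, J)
```

Note that the tolerance overlap can equal 1 even when the ordinary overlap $m$ is well below 1, which is what distinguishes the superretrieval state from ordinary perfect retrieval of $s$ itself.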

An explanation, given by Shiino and Fukai [2], for the possibility of such states is briefly as follows: For
such states $r_{\mathrm{AGS}} \to +0$ holds, which has been confirmed by numerically solving the relevant self-consistent
equations. Since the variance of the noise term (without the "systematic" term, $\Gamma Y$, in their terminology [2])
is given by $\alpha r_{\mathrm{AGS}}$ in SCSNA, this implies that the effect of the noise completely vanishes in these states,
which enables the superretrieval.
Figure 6: Plot of the critical initial overlap $m_c$ against $\alpha$, evaluated by 2-DRT (solid line) and by simulation
(markers) for $\theta = +\infty$, 1.4 and 0.7.

Figure 7: Plot of the value of $m$ of the retrieval state against $\alpha$, evaluated by 2-DRT (solid line) and by
simulation (markers) for $\theta = +\infty$, 1.4 and 0.7.

Figure 8: Time evolution of $r$ for $\theta = 0.4$, $\alpha = 0.05$, and with initial condition $m_0 = 0.9$, evaluated by
simulation with $N = 2^{15}$ (solid) and by 2-DRT (dashed).

Figure 9: Noise distribution $D[z]$ at the equilibrium state ($(m, r) = (0.398, 0.00440)$) achieved by the
simulation with $\theta = 0.4$, $\alpha = 0.05$, $N = 2^{15}$, and initial condition $m_0 = 0.9$ (solid), and the one computed
by 2-DRT for the same values of $(m, r)$ (dashed).

Figure 10: Plot of $1/r$ versus $\ln t$ for $\theta = 0.4$, $\alpha = 0.05$, and with initial condition $m_0 = 0.9$, evaluated
by 2-DRT.



Especially interesting is whether or not 2-DRT provides an appropriate description of the dynamics
heading for superretrieval states. To see this, we examined the case where $\theta = 0.4$ and $\alpha = 0.05$, which,
according to the SCSNA analysis, is expected to have superretrieval states. Figure 8 shows the time evolution
of $r$ evaluated by simulation and by 2-DRT, with initial condition $m_0 = 0.9$. Again, 2-DRT reproduced
the simulation result fairly well. For the simulation, state transitions ended (as confirmed numerically) at
$t \approx 30$, where $(m, r) = (0.398, 0.00440)$. The tolerance overlap was evaluated to be exactly equal to
1 at this state, indicating that this is the superretrieval state. Figure 9 shows the noise distribution
$D[z]$ at the equilibrium state achieved by the simulation, and the one computed by 2-DRT for the
same values of $(m, r)$. They are in good agreement, suggesting that 2-DRT can successfully predict the
trajectories even in the case of superretrieval, provided that the system size $N$ is sufficiently large. As
for 2-DRT, the value of $r$ was still decreasing at $t \approx 10^5$ (we stopped the computation of 2-DRT at $t = 10^5$
because rounding errors became significant beyond this point), where $(m, r) = (0.399, 0.00159)$. We
observed numerically that the $t$-dependence of the decrease of $r$ can be expressed, to a good approximation,
as $r(t) \propto 1/\ln t$ (Fig. 10), which strongly supports the conjecture that the trajectory obtained by 2-DRT
approaches $r = 0$. As can be seen from the RS noise distribution, the variance of the noise term is, roughly
speaking, given by $\alpha r$ in 2-DRT. Then, if the superretrieval states are described appropriately by 2-DRT, they
should correspond to the states with $r = +0$. We therefore investigate in the following the solutions of
the saddle-point equations (11) when $r = +0$.

By formally taking the limit $r \to +0$, the saddle-point equations (11) are reduced to the following
equations

$$\ldots, \qquad \rho = -\infty$$

The condition that the macroscopic state $(m, r \to +0)$ is an equilibrium state, that is, that $(\dot m, \dot r) = (0, 0)$
holds, is thus given by

$$f(m \pm \alpha) = \mp 1. \tag{18}$$

For the current choice of the function $f$ (eq. (3)), this condition is satisfied when $0 < m - \alpha < \theta < m + \alpha$,
or equivalently, $\max\{\theta - \alpha, \alpha\} < m < \min\{\theta + \alpha, 1\}$. This is consistent with the observation from
the simulations that $m$ tends to approach $\theta$ when the superretrieval occurs.
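The equilibrium condition (18) and the resulting window for $m$ are easy to check numerically. The following Python sketch uses hypothetical function names and a scalar version of the output function of eq. (3).

```python
def f_nonmono(x, theta):
    """Scalar non-monotonic output function: sgn(x) if |x| < theta, else -sgn(x)."""
    sign = (x > 0) - (x < 0)
    return sign if abs(x) < theta else -sign

def superretrieval_window(theta, alpha):
    """Bounds (lo, hi) of the open interval max{theta-alpha, alpha} < m < min{theta+alpha, 1}."""
    return max(theta - alpha, alpha), min(theta + alpha, 1.0)

def is_superretrieval_equilibrium(m, theta, alpha):
    """Check f(m - alpha) = +1 and f(m + alpha) = -1, i.e. condition (18)."""
    return f_nonmono(m - alpha, theta) == 1 and f_nonmono(m + alpha, theta) == -1

# Usage for the case examined in the text: theta = 0.4, alpha = 0.05.
lo, hi = superretrieval_window(0.4, 0.05)                  # approximately (0.35, 0.45)
ok = is_superretrieval_equilibrium(0.398, 0.4, 0.05)       # True
```

The value $m = 0.398$ reached in the simulation lies inside the window, in line with the observation quoted above.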

The applicability of the argument above depends not only on the two assumptions made at the
beginning but also on the following two points. The first one concerns the so-called freezing line, which
consists of the points in the $(m, r)$ plane where the number of microscopic states within the $(m, r)$-subshell
changes from an exponentially large number to an exponentially small number in terms of $N$. The second
one concerns the so-called de Almeida-Thouless (AT) line [18], at which replica-symmetry breaking
(RSB) occurs and thus the RS ansatz is no longer valid.

The freezing line is given, under the RS ansatz, by the following equation [11]:

The number of microscopic states within the subshell is exponentially large, and therefore taking averages
over the subshell is expected to have the proper meaning, as long as the left-hand side of eq. (19) is
positive. For $r = 1$ and $|m| < 1$ the left-hand side becomes

so that the number of microscopic states is indeed exponentially large. In the limit $r \to +0$, however,
the left-hand side of eq. (19) goes to $-\infty$, implying that $r = 0$ is outside the freezing line. This means
that there are very few microscopic states near $r = 0$, which may in part explain the observation from the
simulations that the trajectories approaching $r = 0$ reach equilibrium before actually arriving at $r = 0$.
That $r = 0$ is outside the freezing line also means that the argument with the formal limit $r \to +0$
eventually loses its proper meaning, because the relevant subshell average is over an exponentially small
number of microscopic states. Numerical evaluation reveals, however, that the freezing line for $r < 1$
lies very close to $r = 0$ (Fig. 11). Moreover, the saddle-point solution of the order parameters changes
smoothly as $r$ tends to $+0$: Let $\mu_0$, $q_0$ be the asymptotic values of the saddle-point solution $\mu$, $q$,
respectively. Then the asymptotic form of the saddle-point solution as $r \to +0$ is given by

$$\rho = \ldots, \quad \lambda = \ldots, \quad \mu = \ldots, \quad q = \ldots, \quad \Delta = -\alpha + O(r). \tag{21}$$

We can therefore expect that the argument presented above on the formal limit $r \to +0$ captures well the
qualitative aspects of the equilibrium states achieved by the simulations.

Figure 11: The freezing line (solid) and the AT line (dashed) for $\alpha = 0.05$.

The AT line is determined by examining the stability of the RS solution. Assuming that RSB is caused
by destabilization of the so-called "replicon" modes [16], [17], [18] for the case $r < 1$, just as has been
assumed for the case $r > 1$ [11], the AT line turns out to be given by the same formula as given in [11]
for $r > 1$:

$$\ldots \int \frac{Dy}{\cosh^4(\lambda y + \mu)} \ldots \tag{22}$$

The RS solution is valid if the right-hand side of eq. (22) is less than $\alpha$. We confirmed that, for the case
where $\alpha = 0.05$, the AT line lies in the region $m > 0.890$ for $r < 1$ (Fig. 11), which implies that the
superretrieval observed in the simulations was unrelated to RSB.

We have conducted a local stability analysis of the 2-DRT stationary solutions $r = 0$,
$\max\{\theta - \alpha, \alpha\} < m < \min\{\theta + \alpha, 1\}$ corresponding to the superretrieval state. The analysis shows that,
among the superretrieval solutions, $(m, r) = (\theta, 0)$ is the only attractor when $2\alpha < \theta < 1$, whereas it
becomes unstable when $\alpha < \theta < \min\{2\alpha, 1\}$. For the details of the local stability analysis, see Appendix A.
In the simulations, however, the superretrieval was not observed for all $(\theta, \alpha)$ values satisfying
$2\alpha < \theta < 1$. Figure 12 shows the region where the local stability analysis of 2-DRT predicts the
stable superretrieval state, the one where the superretrieval is observed by simulations (in the sense that
the tolerance overlap is numerically evaluated to be 1), and the one where the superretrieval is predicted by
tracking the 2-DRT trajectories starting at $(m, r) = (m_0, 1)$. The result of 2-DRT trajectory tracking
and that of the simulations are in good agreement with each other. Both are within the region where the
superretrieval state is locally stable, as they should be, but they evidently do not coincide with it. The region
where the superretrieval is observed in simulations may be further restricted by the following factors:

• The superretrieval solution $(m, r) = (\theta, 0)$ may not be reachable from the conventional initial
states with $r = 1$, even though it is an attractor.

• The superretrieval solution $(m, r) = (\theta, 0)$ may be outside the RS region, where the
stationarity and local stability arguments, both based on the RS ansatz, are no longer valid.

Figure 12: The region where the superretrieval solution $(m, r) = (\theta, 0)$ is stable, evaluated by 2-DRT
local stability analysis (shaded region), by tracking simulation trajectories (region below thick solid curve)
and by tracking 2-DRT trajectories (region below thick dashed curve). The region where the superretrieval
state exists, evaluated by SCSNA (see Fig. 4), is also shown (dotted) for comparison.

Figure 13: Trajectories predicted by 2-DRT in the $(m, r)$ plane with $(\theta, \alpha) = (0.4, 0.15)$.



A demonstration regarding the former factor is shown in Fig. 13. For the condition $(\theta, \alpha) = (0.4, 0.15)$
(marked by a cross in Fig. 12), for example, the superretrieval state is not observed by following the time
evolution by either numerical simulation or 2-DRT. Nevertheless, 2-DRT predicts that under this condition
the stable superretrieval state exists at $(m, r) = (0.4, 0)$. As shown in the figure, 2-DRT trajectory
tracking shows that under this condition the superretrieval state is indeed stable, but it is not reachable
from the initial states with $r = 1$.
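The outcome of the local stability analysis quoted above can be encoded as a small lookup, shown here as a Python sketch. The function name and the returned labels are hypothetical, and parameter values not covered by the two stated ranges (including the boundary $\theta = 2\alpha$) are reported as such rather than guessed.

```python
def classify_superretrieval(theta, alpha):
    """Local stability of (m, r) = (theta, 0) according to the two ranges stated in the text."""
    if 2.0 * alpha < theta < 1.0:
        return "stable"
    if alpha < theta < min(2.0 * alpha, 1.0):
        return "unstable"
    return "not covered"

# Usage: the condition marked by a cross in Fig. 12.
label = classify_superretrieval(0.4, 0.15)   # "stable", since 0.3 < 0.4 < 1
```

As the example of $(\theta, \alpha) = (0.4, 0.15)$ shows, a "stable" label only states local stability; it does not guarantee that the state is reachable from initial conditions with $r = 1$.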

An interesting observation related to the latter factor is that there is a rough numerical correspondence
between the region where the equilibrium superretrieval solution $(m, r) = (\theta, 0)$, $\alpha < \theta < 1$, which may
not be stable, satisfies the RS ansatz, and the region where SCSNA predicts superretrieval to occur, as
shown in Fig. 14.

Figure 14: The region where, according to 2-DRT, the superretrieval solution $(m, r) = (\theta, 0)$ is a
stationary point and satisfies the RS ansatz (solid) and the superretrieval phase evaluated by SCSNA
(dashed).

5 Conclusion 

We have studied the question of how well 2-DRT describes the retrieval dynamics of the non-monotonic
model. Although there is no theoretical justification for 2-DRT to be exact for the non-monotonic
model either, 2-DRT turns out to reproduce the retrieval dynamics quite well, and it gives reasonable results
for the capacity, the basins of attraction, and the superretrieval states.



References 

[1] M. Morita. Associative memory with nonmonotone dynamics. Neural Networks, 6:115-126, 1993. 

[2] M. Shiino and T. Fukai. Self-consistent signal-to-noise analysis of the statistical behavior of analog 
neural networks and enhancement of the storage capacity. Phys. Rev. E, 48:867-897, 1993. 

[3] H. Rieger, M. Schreckenberg, and J. Zittartz. Glauber dynamics of the Little-Hopfield model. Z. 
Phys. B — Condensed Matter, 72(4):523-533, 10 1988. 

[4] E. Gardner, B. Derrida, and P. Mottishaw. Zero temperature parallel dynamics for infinite range 
spin glasses and neural networks. J. Physique, 48(5):741-755, 5 1987. 

[5] H. Horner, D. Bormann, M. Frick, H. Kinzelbach, and A. Schmidt. Transients and basins of
attraction in neural network models. Z. Phys. B — Condensed Matter, 76(3):381-398, 9 1989.

[6] M. Okada. A hierarchy of macrodynamical equations for associative memory. Neural Networks, 
8(6):833-838, 1995. 

[7] S. Amari and K. Maginu. Statistical neurodynamics of associative memory. Neural Networks,
1(1):63-73, 1988.

[8] H. Nishimori and T. Ozeki. Retrieval dynamics of associative memory of the Hopfield type. J. Phys. 
A: Math. Gen., 26(4):859-871, 2 1993. 

[9] H. Nishimori and I. Opriş. Retrieval process of an associative memory with a general input-output
function. Neural Networks, 6:1061-1067, 1993.



[10] A. C. C. Coolen and D. Sherrington. Dynamics of fully connected attractor neural networks near
saturation. Phys. Rev. Lett., 71(23):3886-3889, 12 1993.

[11] A. C. C. Coolen and D. Sherrington. Order-parameter flow in the fully connected Hopfield model 
near saturation. Phys. Rev. E, 49(3):1921-1934, 3 1994. 

[12] T. Ozeki and H. Nishimori. Noise distributions in retrieval dynamics of the Hopfield model. J. 
Phys. A: Math. Gen., 27(21):7061-7068, 11 1994. 

[13] T. Tanaka and S. Osawa. On macroscopic description of recurrent neural network dynamics. J. 
Phys. A: Math. Gen., 31(18):4197-4202, 5 1998. 

[14] A. C. C. Coolen and S. Franz. Closure of macroscopic laws in disordered spin systems: a toy model. 
J. Phys. A: Math. Gen., 27(21):6947-6954, 11 1994. 

[15] D. J. Amit, H. Gutfreund, and H. Sompolinsky. Storing infinite numbers of patterns in a spin-glass 
model of neural networks. Phys. Rev. Lett., 55(14):1530-1533, 9 1985. 

[16] D. J. Amit, H. Gutfreund, and H. Sompolinsky. Statistical mechanics of neural networks near
saturation. Ann. Phys., 173(1):30-67, 1 1987.

[17] K. H. Fischer and J. A. Hertz. Spin glasses, volume 1 of Cambridge studies in magnetism. Cambridge 
university press, Cambridge, 1991. 

[18] J. R. L. de Almeida and D. J. Thouless. Stability of the Sherrington-Kirkpatrick solution of a spin 
glass model. J. Phys. A: Math. Gen., 11(5):983-990, 1978. 

A Local Stability Analysis of Superretrieval States 

We first split the RS noise distribution $D^{\mathrm{RS}}_{m,r}[z]$ into two components, as follows: 

$$D^{\mathrm{RS}}_{m,r}[z] = D^-[z] + D^+[z], \tag{23}$$

$$D^\pm[z] = \frac{e^{-(\Delta \pm z)^2/2\alpha r}}{2\sqrt{2\pi\alpha r}}\, w_\pm(z), \tag{24}$$



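$$w_\pm(z) = 1 - \int Dy\, \tanh\left[\lambda y \left(\frac{\Delta}{\rho\alpha r}\right)^{1/2} + (\Delta \pm z)\frac{\rho r_{\mathrm{AGS}}}{r} \pm \mu\right]. \tag{25}$$

Note that $w_\pm(z)$ is "slowly varying" with respect to $z$, because 

$$\left|\frac{dw_\pm(z)}{dz}\right| \le \left|\frac{\rho r_{\mathrm{AGS}}}{r}\right| \tag{26}$$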



holds and the bound $|\rho r_{\mathrm{AGS}}/r|$ remains finite even when $r \to +0$. We can thus regard each 
component as basically a Gaussian distribution centered at $z = \mp\Delta$ with width $O(\sqrt{\alpha r})$, 
modulated by a bounded, monotonic, and slowly varying function $0 < w_\pm(z) < 2$. From the asymptotic 
form of the saddle-point solution as $r \to +0$ (eq. (pl"l)), we can expect, for small $r$, that the noise 
components $D^\pm[z]$ become sharply peaked around $z = \mp\Delta \approx \pm\alpha$. In the limit $r \to +0$, we have 

$$D^\pm[z] = \frac{1 \mp m}{2}\,\delta(z \mp \alpha), \tag{27}$$

so that the condition 

$$\max\{\theta - \alpha,\ \alpha\} < m < \min\{\theta + \alpha,\ 1\} \tag{28}$$



is obtained for the existence of equilibrium states of the form (m, r) = (m, 0), as discussed in Sect. O 
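
The existence condition can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the output function implied by the four regions of eq. (30) (f = +1 or -1 with sign changes at 0 and at ±θ) and a two-delta noise distribution with weights (1 ∓ m)/2 at z = ±α. The function names `f`, `flow_at_r0` and `existence_window` are hypothetical.

```python
import math


def f(x, theta):
    """Non-monotonic output (assumed form): sgn(x) for |x| < theta, -sgn(x) beyond."""
    if x == 0.0:
        return 0.0
    s = math.copysign(1.0, x)
    return s if abs(x) < theta else -s


def flow_at_r0(m, theta, alpha):
    """Right-hand sides of the flow for m and for r/2, evaluated at r = 0.

    Assumption: the noise is two delta peaks, with weight (1 - m)/2 at
    z = +alpha and weight (1 + m)/2 at z = -alpha.
    """
    peaks = [((1.0 - m) / 2.0, alpha), ((1.0 + m) / 2.0, -alpha)]
    dm = sum(w * f(m + z, theta) for w, z in peaks) - m
    half_dr = sum(w * z * f(m + z, theta) for w, z in peaks) / alpha + 1.0
    return dm, half_dr


def existence_window(theta, alpha):
    """Open interval of m allowed by max{theta - alpha, alpha} < m < min{theta + alpha, 1}."""
    return max(theta - alpha, alpha), min(theta + alpha, 1.0)


if __name__ == "__main__":
    theta, alpha = 0.5, 0.1
    print(existence_window(theta, alpha))
    print(flow_at_r0(0.5, theta, alpha))  # inside the window
    print(flow_at_r0(0.3, theta, alpha))  # outside the window
```

Inside the window both right-hand sides vanish for every m, which is why a whole interval of states (m, 0) is stationary at this level of approximation; outside the window the flow of m is nonzero.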



In this section we analyze local stability of the equilibrium states $(m, r = 0)$ satisfying the condition 
(28). Using the noise components, the time evolution equations are rewritten as 

$$\begin{aligned}
\dot m &= \int dz\, D^-[z]\, f(m+z) + \int dz\, D^+[z]\, f(m+z) - m, \\
\frac{1}{2}\dot r &= \frac{1}{\alpha}\left[\int dz\, D^-[z]\, z f(m+z) + \int dz\, D^+[z]\, z f(m+z)\right] + 1 - r.
\end{aligned} \tag{29}$$



Because $D^\pm[z]$ are sharply peaked, as the first step of approximation we can assume that $f(m+z) = 
f(m \mp \Delta)$ in the integrals with $D^\pm[z]$. This assumption becomes exact in the limit $r \to +0$ and 
when $f(m+z)$ is continuous around $z = \pm\alpha$, but for finite $r$ it gives an approximate result and the 
approximation error comes from the contribution of the tails of $D^\pm[z]$ where $f(m+z)$ changes sign. 
For explanation purposes we introduce the following four regions: 



$$\begin{aligned}
\mathrm{I} &= \{z \mid m + z < -\theta\}, \\
\mathrm{II} &= \{z \mid -\theta < m + z < 0\}, \\
\mathrm{III} &= \{z \mid 0 < m + z < \theta\}, \\
\mathrm{IV} &= \{z \mid \theta < m + z\}.
\end{aligned} \tag{30}$$

$f(m+z) = 1$ for $z \in$ I or III, and $f(m+z) = -1$ for $z \in$ II or IV. The equilibrium states which we are 
interested in correspond to the case where the peak of $D^+[z]$ is in the region IV and that of $D^-[z]$ in 
the region III. In this case the time evolution equations are approximated to be 
However, direct calculation shows that the right-hand sides of these equations are exactly equal to 0. This 
fact indicates that the time evolution near $r = 0$ should be governed by the contribution of the tails. 
The principal contribution comes from the largest one of the following three quantities: 

1. Contribution of the tail of $D^+[z]$ in the region III: 

$$I_1 = 2\int_{\mathrm{III}} dz\, D^+[z]\, a(z)$$

2. Contribution of the tail of $D^-[z]$ in the region IV: 

$$I_2 = -2\int_{\mathrm{IV}} dz\, D^-[z]\, a(z)$$

3. Contribution of the tail of $D^-[z]$ in the region II: 

$$I_3 = -2\int_{\mathrm{II}} dz\, D^-[z]\, a(z)$$
where $z_0 = \theta - m$, and $a(z) = 1$ or $z$, depending on which of $\dot m$ and $\dot r$ we are considering. We approximate 
each contribution by extending the integral region to $\infty$ or $-\infty$. In fact this approximation does not 
affect the final result in the $r \to +0$ limit because it changes each quantity by a vanishingly small amount. 
First let us consider the contribution to $\dot m$. Evaluation for $I_1$ yields 
Similarly, for $I_2$ and $I_3$, we have 
respectively. In the $r \to +0$ limit, the dominant contribution comes from the first term of the exponent 
for each case, so that comparison of this term is sufficient to determine which of $I_1$, $I_2$, and $I_3$ has the 
largest contribution to $\dot m$. The result of the comparison for small $r$ is summarized as follows: 

• When $-2\Delta < \theta$, the largest contribution comes from $I_1$ or $I_2$. If $m < \theta$, $I_1$ is the largest and 
$\dot m > 0$. Otherwise, $I_2$ is the largest and $\dot m < 0$. 

• When $-2\Delta > \theta$, the largest contribution comes from $I_2$ or $I_3$. Since both $I_2$ and $I_3$ have negative 
contribution, $\dot m < 0$. 

From this result, we can conclude that the stable superretrieval state, if it exists, should be $(m, r) = 
(\theta, 0)$, and that $2\alpha < \theta < 1$ is a necessary condition for the existence of the stable superretrieval state. 
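
The comparison of the three tail terms can be illustrated with a short numerical sketch. It is only an illustration under stated assumptions: the small-r value Δ = -α is used, only the leading Gaussian exponents of the tails are compared (prefactors are ignored), and the helper names `tail_exponents` and `dominant_tail` are hypothetical.

```python
def tail_exponents(m, theta, alpha):
    """Leading exponents of I1, I2, I3, each multiplied by 2*alpha*r.

    Assumptions: Delta = -alpha (small-r limit); D+ is a Gaussian peaked at
    z = -Delta and D- a Gaussian peaked at z = +Delta; each tail integral is
    dominated by the Gaussian factor at the boundary of its region.
    """
    delta = -alpha
    z0 = theta - m
    return {
        "I1": (delta + z0) ** 2,  # D+ evaluated at z = z0
        "I2": (delta - z0) ** 2,  # D- evaluated at z = z0
        "I3": (delta + m) ** 2,   # D- evaluated at z = -m
    }


def dominant_tail(m, theta, alpha):
    """Return the tail with the smallest exponent and the sign it gives to dm/dt."""
    exps = tail_exponents(m, theta, alpha)
    name = min(exps, key=exps.get)
    sign = 1 if name == "I1" else -1  # I1 pushes m up; I2 and I3 push it down
    return name, sign


if __name__ == "__main__":
    # 2*alpha < theta: the flow points toward m = theta from both sides.
    print(dominant_tail(0.45, 0.5, 0.1))
    print(dominant_tail(0.55, 0.5, 0.1))
    # 2*alpha > theta: I3 dominates and m decreases.
    print(dominant_tail(0.12, 0.15, 0.1))
```

The smallest exponent dominates as r → +0 because each term scales as the exponential of minus the exponent divided by 2αr.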

Let us now take a closer look at the flow near the state $(m, r) = (\theta, 0)$. We let $\varepsilon = -z_0 = m - \theta$, 
and consider time evolution of the two small quantities, $\varepsilon$ and $r$. In the following arguments we assume 
that $-2\Delta < \theta$ holds, so that the principal contribution comes from $I_1$ or $I_2$, but not from $I_3$. Under this 
assumption, we have 
where 



$$\delta = \frac{1}{2}\left(w_+(z_0) - w_-(z_0)\right). \tag{39}$$

Note that $-1 < \delta/w < 1$ holds. 

Assuming that $\varepsilon/r$ remains finite, we can readily see that $\dot r$ is smaller in magnitude than $\dot\varepsilon$ by a 
factor $r$. Then for small enough $r$ the slaving principle applies and $\varepsilon$ is expected to relax toward its 
equilibrium value much faster than $r$. This justifies the adiabatic approximation, and we can regard 
the equilibrium condition for $\varepsilon$, 

$$\tanh\frac{\Delta\varepsilon}{\alpha r} + \frac{\delta}{w} = 0, \tag{40}$$

as holding throughout the dynamics. This is indeed consistent with the assumption that $\varepsilon/r$ remains finite. 
$\dot r$ is then given by 

~~ >(aO['-(D*r<°- 



2ar 



This shows that the state $(m, r) = (\theta, 0)$ is actually a stable point of the dynamics described by 2-DRT 
under the condition $2\alpha < \theta < 1$. 
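
As a worked check of the equilibrium condition (40), the following sketch combines the leading Gaussian factors of $I_1$ and $I_2$ for $a(z) = 1$. It rests on assumptions that are not taken from the text: the algebraic prefactors of the two tail integrals are treated as equal (they differ only at relative order $\varepsilon$), $w_\pm$ stands for $w_\pm(z_0)$, and $w$ and $\delta$ are taken to be the half-sum and half-difference of $w_+(z_0)$ and $w_-(z_0)$.

```latex
\begin{align*}
I_1 + I_2
  &\propto w_+\, e^{-(\Delta + z_0)^2/2\alpha r}
         - w_-\, e^{-(\Delta - z_0)^2/2\alpha r},
         \qquad z_0 = -\varepsilon, \\
  &= e^{-(\Delta^2 + \varepsilon^2)/2\alpha r}
     \left[ w_+\, e^{\Delta\varepsilon/\alpha r}
          - w_-\, e^{-\Delta\varepsilon/\alpha r} \right] \\
  &= 2\, e^{-(\Delta^2 + \varepsilon^2)/2\alpha r}
     \cosh\frac{\Delta\varepsilon}{\alpha r}
     \left[ w \tanh\frac{\Delta\varepsilon}{\alpha r} + \delta \right],
     \qquad
     w = \tfrac{1}{2}(w_+ + w_-), \quad
     \delta = \tfrac{1}{2}(w_+ - w_-).
\end{align*}
```

Under these assumptions the bracket vanishes exactly when $\tanh(\Delta\varepsilon/\alpha r) = -\delta/w$. Because $|\delta/w| < 1$, this has the solution $\varepsilon = -(\alpha r/\Delta)\,\operatorname{artanh}(\delta/w)$, which is proportional to $r$ and so keeps $\varepsilon/r$ finite.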



